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NONPARAMETRIC ESTIMATION OF SURFACE INTEGRALS 

By Raul Jimenez^ and J. E. Yukich^ 

Universidad Carlos III de Madrid and Lehigh University 

The estimation of surface integrals on the boundary of an un- 
known body is a challenge for nonparametric methods in statistics, 
with powerful applications to physics and image analysis, among 
other fields. Provided that one can determine whether random shots 
hit the body, Cuevas et al. [Ann. Statist. 35 (2007) 1031-1051] es- 
timate the boundary measure (the boundary length for planar sets 
and the surface area for 3-dimensional objects) via the considera- 
tion of shots at a box containing the body. The statistics considered 
by these authors, as well as those in subsequent papers, are based 
on the estimation of Minkowski content and depend on a smoothing 
parameter which must be carefully chosen. For the same sampling 
scheme, we introduce a new approach which bypasses this issue, pro- 
viding strongly consistent estimators of both the boundary measure 
and the surface integrals of scalar functions, provided one can collect 
the function values at the sample points. Examples arise in experi- 
ments in which the density of the body can be measured by physical 
properties of the impacts, or in situations where such quantities as 
temperature and humidity are observed by randomly distributed sen- 
sors. Our method is based on random Delaunay triangulations and 
involves a simple procedure for surface reconstruction from a dense 
cloud of points inside and outside the body. We obtain basic asymp- 
totics of the estimator, perform simulations and discuss, via Google 
Earth's data, an application to the image analysis of the Aral Sea 
coast and its cliffs. 

1. Introduction. The estimation of functionals defined on the boundary 
r of an unknown body G C M'^ is a new branch of nonparametric statistics 
with powerful apphcations in several areas, including image analysis and 
stereology [9]. Cuevas et al. [8] address the estimation of the Minkowski 
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fi : = being the Lebesgue measure on M and i?e(x) the closed d-dimensional 
ball with center at x and radius e. When the limit (1.1) exists, it is the most 
basic measure of the content of F and it coincides with length and surface 
area in dimensions 2 and 3, respectively [16]. Minkowski content estimators 
are based on random point samples distributed on a d-dimensional rect- 
angular solid containing G, for which one may determine whether a point 
is in G or not. Roughly speaking, they are empirical measures of the e- 
approximation 



of the Minkowski content of T. Both the statistic considered by Cuevas et 
al. [8] and other closely related statistics [2, 10, 18] depend on the smoothing 
parameter e, which must be chosen as a function of the size of the random 
point sample. 

We propose a different nonparametric approach, free of smoothing param- 
eters, to estimate not only boundary lengths of planar sets and surface areas 
of solids, but also surface integrals of those scalar functions whose values are 
knowable at the sample points. While this paper focuses on instances where 
the body G is unknown, the proposed method is also of interest for bodies 
having a known but complex boundary, one having a surface integral defying 
traditional estimators. 

The nonparametric estimation of surface integrals has practical applica- 
tions in image analysis, when the body G consists of some nonhomogeneous 
material and the density h{x) of the material at x is collected at the sam- 
ple points. Thus, one might, for example, be interested in the mass per 
unit thickness of T, which corresponds to the surface integral commonly 
represented by J-phdT. In some instances, quantities such as temperature 
and humidity can be measured by sensors randomly distributed on a set, in 
which an unknown body is embedded, and a fundamental problem is to es- 
timate the surface integral of these quantities on the boundary of the body. 
In other situations, including those arising in medical imaging, oncology and 
cardiology, knowledge of boundary length is of importance in the prognosis 
of an infarction, as well as in the assessment of the dissemination capacity 
of a tumor, as explained in Cuevas et al. [8]. 

Our statistics are based on the Delaunay triangulation of the sample 
points. Although the idea can be easily adapted to other graphs, in par- 
ticular to the Voronoi diagram, we chose the Delaunay triangulation for two 
specific reasons: 
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1. It is a well-known tool in curve reconstruction methods. In particular, 
the /^-skeleton [15], the Crust [1] and a wide range of related algorithms 
involve computing Delaunay triangulations. 

2. It is computationally efficient — for example, Boissonnat and Cazals [5] 
report a 3-dimensional Delaunay triangulation code which can handle 
500,000 randomly distributed points per minute. 

The basic difference between previous methods and the one introduced 
here is that formerly used methods estimate a fattened boundary, namely 
IJ^gP-Be(x), whereas we directly estimate F by a surface (a curve if d = 2) 
properly selected among the polyhedra (polygons if d = 2) whose faces be- 
long to the Delaunay triangulation. Part of our methodology involves a 
new algorithm for surface reconstruction based on inner and outer sampling 
points, which differs substantially from the numerical methods for surface 
reconstruction in which the sample points are on (or close to) the surface 
to be reconstructed. Our method is described in Section 2, where we intro- 
duce the relevant statistics. Section 3 establishes basic asymptotics of these 
statistics and provides a strongly consistent estimator of the surface inte- 
gral. In Section 4, we perform a simulation study. In particular, we estimate 
the Minkowski content of sets, comparing our method with existing ones. 
An application to image analysis of the Aral Sea coast and its cliffs from 
Google Earth's data is discussed in Section 5. Section 6 summarizes our 
conclusions. Our proofs, given in Section 7, rely on point process methods, 
including weak convergence of point processes and stabilization methods, a 
tool for establishing general limit theorems for sums of weakly dependent 
terms in geometric probability [3, 19-21]. 

2. The method: Sewing boundaries of unknown sets. Following the basic 
assumptions of [8, 10], we will assume that G is a compact subset of an open 
and bounded d-dimensional rectangular solid Q and that the closure of the 
interior of G has positive //-measure. The boundary of G will be denoted by 
r, that is, 

r:={x: for any e > 0, Be{x) D G ^ and B^ix) DG" ^ 0}, 

with G^ := Q\ G. It is assumed that the ^-boundary of G, defined by 

(2.1) {x: for any e >0,/i(Be(x)nG) >0 and /x(5^(x)nG^) >0}, 

coincides with F. This rules out the existence of "extremities" to G having 
null d-dimensional Lebesgue measure. Thus, if we randomly plot enough 
points in Q, there will be points inside and outside G close enough to any 
point on the boundary F. We assume that F is a (d— l)-rectifiable set. That 
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is, there exists a countable collection of continuously differentiable maps 
Qi/.W^-^ such that 



'H'^^^ being the (d— l)-dimensional Hausdorff measure [16]. We will assume 
that r has finite Hausdorff measure and that T has tangent spaces which 
are defined almost everywhere. 

As in [8, 10], the sampling model consists of n i.i.d. random variables 
Xi, . . . ,Xn, uniformly distributed on Q, and n i.i.d. Bernoulli random vari- 
ables 6i, . . . ,6n such that 



In other words, although G is unknown, we know whether a sample point 
Xk is inside G or not. In addition, given measurable /i: Q — t- M, we assume 
that we know the values h(Xi), . . . , h{Xn); in general, h is unknown on its 
domain, but we assume that we are able to collect its value at each of the 
n sample points. Our goal is to estimate the surface integral of h, formally 
defined by 



For sets T satisfying a stronger notion of (d — l)-rectifiability, namely if T 
is the Lipschitz image of a subset of M'^, the integral (2.4) coincides (up 
to a constant factor) with the Minkowski content of T when h = 1. Thus 
the estimated target quantities in [2, 8, 10, 18] can be written as a surface 
integral (2.4) with h{'y) = 1. This brings the issues and problems of [2, 8, 10, 
18] within the compass of this paper. 

Denote by Af„ the set of sample points {Xi, . . . ,Xn} and let T>[Xn) be 
the Delaunay triangulation for X^, namely the full collection of simplices 
(triangles when d = 2) with vertices in X^ satisfying the empty sphere cri- 
terion, namely that no point in Xn is inside the circumsphere (circumcircle 
if d = 2) of any simplex in P(Af„). For any sample of absolutely continu- 
ous i.i.d. points on M*^, there exists a unique Delaunay triangulation almost 
surely [17]. Each simplex s € T>{Xn) is represented by a subset of {d + 1) ver- 
tices belonging to Xn, here denoted by {X^^^), . . . Recalling (2.3), 
we introduce the sewing of F, denoted by S{Xn,T), and defined by 



(2.2) 




(2.3) 




if Xk G G, 
if Xk i G. 



(2.4) 
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Fig. 1. Polar rose with polar coordinate equation p= |sin(5^) (black); Delaunay tessel- 
lation for 10'' i.i.d. points, uniformly distributed on the square [—1,1]^ (green and red). 
Red triangles are the sewing of the rose. 

Thus, S{Xn,T) is the coUection of shnphces in T){Xn) with at least one 
vertex in G and at least one vertex in C^. We may assume without loss of 
generality that there are no sample points on F. In this case, the sewing of 
r consists of the simplices in the triangulation which intersect the boundary 
of G. As an illustration. Figure 1 shows the sewing of a polar rose with five 
petals [with polar coordinate equation = | sin(50)] based on a sample of 
10^ points. 

We are particularly interested in the following two surfaces (curves if 
d = 2) contained in 5(Af„,r): 

The inner sewing, here denoted by S~ {Xn,^)-, which consists of the union 
of all faces of the simplices of S{Xn,T) having vertices in G. 

The outer sewing, here denoted by 5^(A'„,r), which consists of the union 
of all faces of the simplices of S{Xn,T) having vertices in G'^. 

To be more precise, let s be a simplex of 5(Af„,r) and / a face of s, where, 
here and henceforth, by "face" we mean a simplex of dimension {d — 1). 
Every such face / may be represented by a vertex set of size d, denoted by 

(2.5) V(/):={X;(i),...,X;(,)}. 

A face / of s is in the inner sewing 5~(A'„,r) if and only if s G 5(Af„,r) 
and V(/) C G. The face itself need not lie wholly in G. On the other hand, 
a face / of s is in the outer sewing S^{Xn-, T) if and only if s € S{Xn-, F) and 
V(/) C G"^; again, the face need not lie wholly in G'^. Both the inner and 
outer sewing consist of polyhedral surfaces (polygons \{ d = 2) which can be 
used to estimate F. In Figure 2, we show the inner and outer sewing of the 
polar rose in Figure 1, this time based on samples with 10^ and 10^ points. 
As one would expect, both sewings fit the rose when n is large. 
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Now that we have a way to estimate the surface T by samphng surfaces, 
we are ready to explore estimators for the surface integral at (2.4). Any 
numerical approximation of the integral involving either the inner or the 
outer sewing is a potential estimator of the integral on T. In this work, 
we approximate integrals by the trapezoidal rule. Thus, if h is continuous 
and bounded, the surface integral Jjhdf of any face / G S{Xn,^) can be 
approximated by 




V(/) being the vertex set of / defined at (2.5). Here, H.'^^^if) is the length 
of an edge if d = 2, the area of a triangle if d = 3 and the volume of a 
{d — l)-dimensional simplex otherwise. Therefore, the surface integrals of h 
with respect to the inner and outer sewings can be approximated by the 
sums 

(2.6) I-{KT):= ^'"'^f^-di E M^'^)) 
and 

(2.7) /+(/.,r):= ^ ^^^'0' 

respectively. Next, we study the basic properties of the statistics (2.6) and 
(2.7), and provide strongly consistent estimators of J-phdF. 

3. Asymptotics. Let X C he a locally finite point set, that is, for 
any compact set K CM.'^, X r\K contains at most a finite number of points 
from X. Let C M*^ be a body with boundary dB. Define S~ {X ,dB) and 




Fig. 2. Inner (green) and outer (red) sewing of the polar rose (black) with equation 
p = |sin(59), based on 10* (left) and 10^ (right) i.i.d. points uniformly distributed on the 
square [—1,1]'^. 
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S'^{X ,dB) analogously to 5~(Af„,r) and S'^{Xn,T), respectively. For x G 
X, when x G S~ {X ,dB), we let ^~ {x,X ,B) be the normalized sum of the 
Hausdorff measures of the faces belonging to S~ {X ,dB) and containing x. 
lix(^S-{X,dB), then we put ^"(a;, Af, = 0. Thus, 

, , y n'^-Hf), iixe 11 V(/), 

0, otherwise. 

Similarly, we define ^'^{x,X,B) to be the normalized sum of the Hausdorff 
measures of the faces belonging to S^{X ,dB) and containing x] if no such 
face exists, then ^'^ is defined to be zero. 

We will use the functionals ^~ and ^'^ for several purposes; in particular, 
the statistics (2.6) and (2.7) can be expressed as the weighted sums 

n 

(3.1) /-(/i, T) = Y, h{Xk)C {Xk,Xn, G) 

k=l 

and 

n 

(3.2) I+{h,^)=Y,KXk)t{Xk,Xr^,G). 

k=l 

The functional is translation invariant, that is, for all x G ffi'^, all locally 
finite X and all bodies B, we have (,~{x, X, B) = (^~{0,X — x,B — x); here, 
is a point at the origin of R"' and for sets F CW^ and X G W^, we put 
{F — x} := {y — X : y £ F} . Also, given a > and putting aF := {ay '.y & F}, 
we have %'^~^{af) = a'^~^'H'^~^{f). Thus, ^~ satisfies the following scaling 
property for all r] > 0: 

(3.3) v''-^r{0,X,B)=r{0,V'^,vB), 
which, when combined with translation invariance, gives 

(3.4) ij''-'r{x,X,B)=r{0,v{'^-x)MB-x)). 
Thus, by the definition of I~{h,T), we have 

n 

(3.5) I- {h, T) = n-^"-^^/" ^ h{Xk)C (0, n^/\X^ - X^), n^'^G - Xk)). 

k=l 

Similarly, is translation invariant and satisfies the scaling property (3.4) 
and, therefore, 

n 

(3.6) /+(/i,r)=n"('^"i)/'^^/i(Xfc)e+(0,ni/'^(A'„-Xfc),ni/'^(G-Xfc)). 

fc=i 
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Central to our results is the asymptotic behavior of the summands in (3.5) 
and (3.6); see Lemma 7.1. For this, it is convenient to introduce the random 
variable (,{t), defined at (3.8) below. 

Denote by V the homogeneous Poisson point process of intensity 1 on 
and let ■.= V U {0}. In general, for any A > 0, we denote by Vx the 
homogeneous Poisson process of intensity X on Q. 

For all t G R, denote by the half-space 

(3.7) :=M'^"^ X (-oo,t]. 

The homogeneity of the Poisson point process implies, for all t £ M, that 
C~{0,V^,Mf) and C+(0, P", MIJ are equally distributed. 

Given i G M, denote by ^(t) the normalized sum of the Hausdorff measures 
of the faces incident to belonging to the inner sewing S~ (V^ jdHf). If there 
are no such faces, then ^ is set to be zero, that is, 

(3.8) m ■■= r (0, P° , Mf) ^ t{0, Hi,). 

Lemmas 7.2 and 7.3 imply that .^(•) is dominated by an integrable function, 
therefore, the constant 

/•CO 

(3.9) ad:= / nm]dt, 

Jo 

which plays an important role in the asymptotics of our method, is well 
defined. 

It is also meaningful to consider the Poissonized versions of /~(/i, F) and 
^rl (^,r)> namely, 

(3.10) X-(/i,F) := h{x)r{x,Vx,G) 
and 

(3.11) X+(/i,F) := h{x)t{x,Vx,G). 

The following theorems are our main technical results. They take into 
account the possibility that h can be discontinuous on F from either outside 
or inside G, which is not uncommon in applications, including, for example, 
the case when h denotes density of the body G. We say that h is inner 
continuous if the restriction of /i to G is continuous; likewise, we say that h 
is outer continuous if its restriction to the closure of G'^ is continuous. 

Theorem 3.1. Let F be the boundary of a compact set G C M.'^. Assume 
that T is a {d — 1) -rectifiable set, that it has finite Hausdorff measure and 
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coincides with the ^-boundary ofG, defined at (2.1). Ifh is inner continuous, 
we have 

lim E[X"(/i,r)]= lim E[I- (h,T)] = ad [ hdV 

X^oo n~>oo 



and 



lim X, {h,T) = lim /„ {h,r) = aa / hdV 

A^oo n-^oo Jy 



a.s., 



TOi/i Orf defined at (3.9). If h is outer continuous, then we have the same 
asymptotics for and Z+ . 

Theorem 3.2. Let T be as in Theorem 3.1. If h is inner continuous, 
we have 

lim \^'^^^^/^\ai[Z^{h,T)] = Vd [ h^dT, 

A->oo Jp 



where 

(3.12) Vd:= / K[C\t)]dt+ / / ct{z)dzdt, 

Jo Jo Jw 

with 

(3.13) ctiz):=nrio,v''u{z},mf)r{z,v''u{z},Mf)]-inm]?- 

If h is outer continuous, we have the same asymptotics for Var[X^(/i,r)]. 

We expect that modifications of the stabilization arguments in [[3], [19]- 
[21]] will yield a de-Poissonized version of Theorem 3.2, that is, variance 
asymptotics for I~{h,T). These involve nontrivial arguments and we post- 
pone this to a later paper. Roughly speaking, since Poisson input introduces 
more variability than binomial input, we expect the variances of I~{h,T) 
and /+(/i,r) to be no larger than the variances of I~{h,T) and I^{h,T), 
respectively. Consequently, we have good reasons to believe that, under 
the assumptions of Theorem 3.2, both Var[/~(/i,r)] and Var[/+(/i, F)] are 
0(?i~('^~^)/'^). We have not yet obtained analogous results for the rate of con- 
vergence for the bias (E[Z^ {h, T)] — adj^^ dV) . In this direction, we provide, 
in Section 4, some experimental results for dimension d = 2. 

In accordance with the assumptions of Theorems 3.1 and 3.2, we define 



Sn{h,T):-- 



I^{h,T), if h is inner continuous on G, 

but not outer continuous on G, 

I^{h,T), if h is outer continuous on G, 

but not inner continuous on G, 

^(/~(/i,r) + /+(/i,r)), if h is continuous everywhere. 
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In the light of our results and remarks, we propose to estimate the surface 
integral Jp h dT by the strongly consistent sewing-based estimator 

(3.14) a~^Sn{h,T). 

Since Theorems 3.1 and 3.2 only assume the rectifiability of F, the esti- 
mator (3.14) is applicable when the body G has sharp interior or exterior 
cusps. In particular, it can be used for estimating the boundary measure of 
such sets, an estimation problem in which previous methods [2, 8, 18] have 
drawbacks. Under suitable conditions on the smoothing parameter, [10] also 
discusses the consistency of the estimators based on the Minkowski content 
for a general class of bodies. 

4. Simulations. The estimator (3.14) depends on the constant Od defined 
at (3.9). This constant can be estimated by Monte Carlo methods, as follows. 
Consider large n and a surface such that Jpo dT^ is known. Let 1 denote 
the function identically equal to 1 and recall the definition of our estimator 
Sn- Then simulate a random sample of S'„(l,r'') and estimate ad by 

^ _ mean(5„(l,rO)) 

with m.ean{Sn{h,T^)) being the sample mean. Given ctd and an arbitrary 
surface F, the natural estimator of Jp hdT is thus 

(4.1) afSn{h,r). 

Taking this procedure into account, we estimated 02 by performing a 
simulation with 1000 independent copies of S'„(1,F''), with F*^ being the 
unit circle and with sewings based on n = 10^ points uniformly distributed 
on the square [—2,2]^. For this configuration, we obtained 

(4.2) 02 = 1.1820. 

To compare our estimator with those based on empirical approximation 
of the Minkowski content, we estimated the length L = 12^3 20.7846 of 
T, T being Catalan's trisectrix (also called the Tschirnhausen cubic), with 
polar equations 

^ f sec3(e/3), if0<^<^, 
^~ \sec3((27r-^)/3), if ^ < ^ < 2^. 

We used the same simulation framework as used in [8], where n points 
(n = 10,000 and n = 30,000) were drawn from the square [—9, 2] x [—5.5, 5.5] 
500 times. As mentioned above, the estimator proposed in [8] depends on 
the smoothing parameter e, which must be carefully chosen. Of the fourteen 
possible values of the smoothing parameter considered in [8] and of the 
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Table 1 

Mean and standard deviation of estimators based on the Minkowski content, 
with optimal smoothing parameter, and the sewing-based estimators over 500 
replications and simple sizes n — 10,000 and 71 = 30,000 (the true value is 

12^^20.7846; 



n 


mean(i^(r)) 


std(i^(r)) 


mean(if (T)) 


std(i^(r)) 


10,000 


19.3679 


1.0394 


20.7030 


0.3140 


30,000 


19.8237 


1.0666 


20.7328 


0.2375 



fourteen different corresponding estimators, we let (T) be the Minkowski 
content estimator having, on average, the smallest bias, that is, the one 
minimizing the difference between the expectation of the estimator and its 
target value. In Table 1, we compare L^^(r) with the sewing-based estimator 
L^{T), namely 

(4.3) L^^{T) = afSn{l,n 

Roughly speaking, we improve the mean relative error (difference between 
exact and approximate mean value/exact value) by a factor of 10"'^ and we 
significantly reduce the standard deviation. 

Now, consider the planar cardioid C with polar equation p = 1 + cos 9, 
< < 27r. We selected this curve since the sharp cusp at the origin pre- 
cludes using methods based on empirical approximation of the Minkowski 
content. Figure 3 shows how the inner and outer sewing practically over- 
lap the curve with samples of size 10^ of randomly distributed points on 
the square [—0.5,2.5] x [—1.5,1.5]. The target now is to estimate the length 
of the cardioid, which equals 8, using the sewing-based estimator L^{C). 

1.5 

1 

0.5 


-0.5 

-1 



-0.5 0.5 1 1.5 2 2.5 

Fig. 3. Inner (green) and outer (outer) sewing based on 10'^ uniform points overlapping 
a cardioid. 
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Table 2 

Mean, standard deviation and y/n-times mean square error of the 
sewing-based estimator Lf^iC), computed over 1000 independent 
replications 



n 


mean(if (C)) 


std(L^(C)) 


V^xMSE(i^(C)) 


1000 


7.8862 


0.1870 


1.5157 


10,000 


7.9446 


0.1009 


1.3254 


100,000 


7.9772 


0.0538 


1.0801 


1,000,000 


7.9885 


0.0323 


1.1736 



We performed 1000 independent copies of L^(C) for n = 10^, 10^, 10^ and 
10^. The results are summarized in Table 2 and Figure 4. The following 
important remarks can be drawn from this case study: 

1. As one might expect, the sharp inner cusp is slightly underestimated. The 
gain achieved by increasing the sample size is for both bias and standard 
deviation. 

2. The simulations strongly suggest that L^{C) is very close to normal. 
This result can be used to provide confidence intervals for the integral 
that we are estimating. For this, in real applications where there are no 
replicas of the estimator, bootstrap procedures can be used to estimate 
Yav[Sn{h,r)]. 

3. The mean square error scales with -y/n. Thus, the sewing-based estimator 
seems to converge much faster than estimators based on the empirical 
approximation of the Minkowski content, which, in general, we do not 
expect to converge faster than n^^^ [2]. 
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8.5 



8 



7.5 
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1 




+ 






















+ 



1000 10000 100000 1000000 



Fig. 4. Box plots of 1000 replications of the sewing-based estimator Ln{C), for n = 10^, 
10", 10^ and 10^. 
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Table 3 

Mean and standard deviation, computed over 1000 independent 
replications, of the sewing-based estimator of based onn= 10"* 
points uniformly distributed on the square [—0.5,2.5] x [—1.5,1.5] 



c 


HO 


mean(/((^)) 


std(7(C)) 


-1/2 


8.0000 


7.9446 


0.1009 





8.8858 


8.8786 


0.1138 


1 


13.3286 


13.3349 


0.1827 


2 


22.2144 


22.2191 


0.3470 



We finalize our simulation study by estimating line integrals of a para- 
metric family {/i^,C ^ ^} of scalar functions on the cardioid C given by 
p = l + cos 9,0< 9 < 2tt. We choose y) = p^"^-*^/^ for two reasons: 

(i) we have simple expressions for the integrals 

p /'27r 

1(C) = h(^dC = V2 {I + cos 6 f^'^dO, 
Jc Jo 

where we use dC = {p{e)Y + {p' [e)f dO = y^2^d9; 

(ii) the contribution to the total integral of the integral near the sharp inner 
cusp increases with and the variability of /i^ also increases with C. 

As before, we considered 1000 independent replications of the sewing- 
based estimator of /(C) based on n = 10^ lO'', 10^ 10^ points uniformly 
distributed on the square [—0.5,2.5] x [—1.5,1.5]. Similarly to the previous 
case study, increased sample size resulted in smaller bias and smaller stan- 
dard deviation. Also, the simulations strongly suggested that the estimator 
is very close to normal and that the mean square error scales with i/n. Our 
goal now is to highlight how the estimation depends on the parameter C- 
For this, we summarize the results for n = in Table 3. This behavior 
with respect to ( is common for all of the n values that we considered. We 
remark that, on the one hand, the bias is slightly smaller, to the extent that 
the sharp inner cusp has less of an effect on the value of the integral. On 
the other hand, the more the function h differs from being a constant (i.e., 
the case C = ~l/2), the more the variance of /((") grows. 

5. Application to image analysis using Google Earth data. Our simula- 
tions show that we require about 10^ points or more to get attractive esti- 
mators of the full image of a planar set. But they also show that 1000 points 
are enough to obtain good estimates of integrals along curves. This sample 
size is quite manageable for online applications for any new-generation per- 
sonal computer, on which a compiled version of our algorithm would run in 
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less than a second. Even n = 10^ could be implemented for snapshot queries. 
Next, we discuss an application to image analysis using Google Earth data. 
Quite possibly, Google provides a sharp image of the area around your res- 
idence or workplace, but what can you infer from the image of the most 
inaccessible regions of the planet, including, for example, the coast of the 
Aral Sea? 

According to the United Nations, the disappearance of the Aral Sea, once 
the fourth largest inland body of water on the planet, is the worst man-made 
environmental disaster of the 20th century. It is estimated that the size of the 
Aral Sea has fallen by more than 60% since the 1960s. Water withdrawal for 
irrigation has caused a dramatic fall in the water level, revealing a fascinating 
geology of cliffs overlooking the Aral Sea. What is the mean height of the 
cliffs along a long waterfront? How irregular are they? Google Earth provides 
elevation information for any pixel and, from its color, we can determine 
whether it is in the Aral Sea or not. This scenario fits our sampling model 
and thus we may use sewings to estimate the mean and variance of the 
elevations of cliffs along waterfronts. Both parameters are related to common 
topographic measures. According to [13], the standard deviation of elevation 
provides one of the most stable measures of the vertical variability of a 
topographic surface. On the other hand, the elevation mean is related to the 
elevation-relief ratio, namely 

elevation mean - elevation min. 
elevation max. - elevation min. ' 

which has been computed in the past using a point-sampling technique, 
rather than planimetry [22]. 

We focus our analysis on the land area bounded between latitudes 45.9 
and 46.04, and longitudes 58.9 and 59.27. We chose this surface, approxi- 
mately 445 km^, partly because of its complex shape, involving capes and 
bays. Figure 5 (top) shows the Google Earth image of the surface under 
consideration. 

We emphasize the three following important points: 

1. This surface is not a planar rectangle, but we proceed as though it were. 
The land area that concerns us is relatively small and therefore our statis- 
tics do not depend on whether we use Euclidean or geodesic distance. The 
error produced by considering land areas as planar sets is briefly discussed 
later. 

2. One of our basic assumptions is that the set to be estimated is compact 
and contained in the interior of the rectangle to be scanned. This is 
not the case now because the boundary between sea and land is not a 
loop contained by the scanned rectangle. In these cases, boundary effects 
can induce spurious long faces in the extremities of the inner and outer 
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Fig. 5. Google Earth image of the land area bounded between latitudes 45.9 and 46.04, 
and longitudes 58.9 and 59.27 (top); Delaunay triangulation for 1000 points uniformly 
distributed on the rectangle [58.9, 59.27] x [45.9, 46.04] and sewings of the Aral Sea (bottom). 

sewings. It is appropriate to exclude these two faces from each sewing. 
We remark that this procedure does not affect the result for large n. 
3. The fractal-like structure of coast lines is not detectable with the limited 
resolution of Google Earth and thus we may safely assume that the Aral 
Sea coast, even if naturally fractal-like, is a rectifiable curve. 

To summarize and clarify, we use the following procedure: drop 1000 points, 
uniformly distributed, on the longitude/latitude rectangle Q = [58.9, 59.27] x 
[45.9,46.04], obtain their corresponding elevation from Google Earth, deter- 
mine whether they are in the sea or not and, finally, compute the sewings 
of the sample. The results are shown in Figure 5 (bottom). In this context, 
distances between points are found as follows. Setting the Earth's radius to 
be 6371 km, we approximate the distance in kilometers between two lat- 
itude/longitude points (long]^, lati) and (long2,lat2) of Q by the spherical 
law of cosines: 

6371[cos~"'^(sin(lati) sin(lat2)) + cos(lati) cos(lat2) cos(long2 — long]^)]. 

Alternatively, we obtain the same numerical results using the appropriately 
scaled Euclidean distance or the Haversine formula. 

Let r denote the waterfront, that is, the curve contained in Q separating 
land from water in the image, and let L{r) denote the length of F. Let G 
represent the land and h(^x,y) denote the height in meters from the water 
of the latitude/longitude point (^x,y) G G. We are here assuming that the 
discrete data given by Google Earth can be extended to yield a continuous 
approximation of height at those points where there is not enough available 
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information. It is reasonable to assume that the restriction of /i to G is 
continuous, namely that h is inner continuous on G. 

We estimate the mean of the height of the cliffs along T, namely 

dV 



and the standard deviation 



r 




s,:= J j{h-hf — 



by 



' = 10.2787 m and W — , ' ' =9.8102 m, 

i-(i,r) Y /-(i,r) 

respectively. Note that s^, which certainly means a significant variance 
of the height of cliffs along the waterfront. 

We remark that one can compute the above estimators without knowing 
the value of ad- Using our estimation of this constant, the length of the 
coastline L{T) is estimated using (4.2) and (4.3), yielding 



2a2 
= 40.977 km. 

This length may be contrasted with the distance between the lower vertices 
of Q, which is approximately 28.6313 km. All of the above estimates could 
be easily implemented using Google Earth. This requires that the user enter 
the southwest and northeast coordinates, and that there is a rule (the pixel 
color, in our case) characterizing the set to be estimated. Of course, the larger 
the land area, the greater the differences between spherical and Euclidean 
measures. Thus, the larger the area, the worse the estimate. We should 
mention that more sophisticated applications, with which we could provide 
approximations of the estimation errors, using re-sampling, for example, 
could still be costly for online consultation with current processors. 



6. Conclusions. Random scanning of unknown bodies is a good alterna- 
tive to regular scanning when the underlying morphology is complex. The 
paper of Cuevas et al. [8] addressed the problem of estimating the bound- 
ary measure of a set for the former type of these scanning setups. For this 
sampling scheme, we introduce an efficient computational method for esti- 
mating not only the boundary measure, but also surface integrals of scalar 
functions, provided one can collect the function values at the sample points. 
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We discuss conditions for getting strong consistency, as well as some issues 
related to the rate of convergence of our estimators. Our proofs rely on 
point process methods, including weak convergence of point processes and 
stabilization methods. 

We perform a simulation study to compare our estimators with previous 
estimators of boundary lengths of sets, concluding that the sewing-based es- 
timators (4.1) significantly reduce the errors and computation times, while 
increasing precision. We complete our simulation study by estimating bound- 
ary lengths of sets with sharp cusps, as well as the integrals of scalar func- 
tions on the boundaries of these sets, always obtaining good results. 

An online application to image analysis using Google Earth data is dis- 
cussed. In particular, a complex waterfront of the Aral Sea, approximately 
41 km according to our estimators, is analyzed. Specifically, we estimate 
surface integrals related to the mean and standard deviation of the height 
of the irregular cliffs facing the sea line. 

7. Proofs and technical results. As indicated at the outset, our approach 
makes use of stabilizing functionals ^, where ^(x, X) is a translation invariant 
functional defined on pairs (x, X), where x G M*^ and A:" C M'^ is locally finite. 
Translation invariance means that ^(x, X) = ^{x + y,x + X) for all y G W^. 
We recall a few facts about such functionals [[3], [19]-[21]]. As in [19], we 
consider the following metric on the space C of locally finite subsets of W^: 

(7.1) D{A,A') := {max{K e'M:Ar\BK{0)=A'r\BK{0)})~\ 

We say that ^(•,-) stabilizes on a homogeneous Poisson point process V 
if, for all X G M*^, there is an a.s. finite R := R{x,V) such that 

e(x, V n Br{x)) = ^(x, [V n Br{x)] u A) 

for all A C {BR{x)y. Recall that V° ■=VU {0}. Now, whenever ^(•, •) sat- 
isfies stabilization, then is a continuity point for the function g{A) := 
^(0, A) with respect to the topology on C induced by D. Thus, by the contin- 
uous mapping theorem (Proposition A2.3V on page 394 of [11]), if 3^n, n>l, 

is a sequence of random point measures satisfying 3^„ as n — )• oo, then 

C(0,3^n) ^^(0,P°); see [19] for details. 

Thus, if Uk, k>l, are i.i.d. uniform on the unit cube with ZY„ := {Uk}^^i 
and if ^ is translation invariant so that £^{'n}/'^Ui,v}^'^lAn) = (,{0,n^^'^{V(n — 
Ui)), then, since the shifted and dilated random point measures n^^^{Un — 

Ui) satisfy the convergence n ^/'^{Un - Ui) -P", it follows from stabiliza- 
tion that, as n — )• oo. 



(7.2) 
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This result is central to proving the weak law of large numbers [21], 

n 

(7.3) n-iJ]^(ni/'^C/fc,nV'^Z^„)4E[e(0,P°)]. 
fc=i 

The analogous convergence result for the two-dimensional vector 

[e(ni/^C/fc, ni/^ZY„), e(ni/'^[/, , n^/'^Un)] , k^j, 

is likewise central to showing variance asymptotics and central limit theo- 
rems [3, 19] for the sums in (7.3). 

One of the main features of our proofs is that if ^ is stabilizing, if 3^„, re > 1, 
is a sequence of point processes on sets increasing up to the half-space H 

which is a translation/rotation of Hq := R"^"-^ x (— oo, 0] and if VnH 
as re — )• oo, then, subject to the proper scaling, there exist limit results for ^ 
analogous to those at (7.3). Our goal is to make these ideas precise. 

We prove our main results for the functional ^~ . Identical results hold for 
the functional ^'^ and the proofs for it are analogous. We henceforth simply 
write ^ for ^~ and I for I~ . 

To demonstrate the required asymptotics for (3.1) and (3.2), and to take 
advantage of the ideas just discussed, it is natural to parametrize points in 
Q as follows. First, assume without loss of generality that Q has Lebesgue 
measure equal to 1. Let M{G) be the medial axis of G, that is, the closure of 
the set of all points in G with more than one closest point on T. In general, 
M{G) is a nonregular (d — 1) -dimensional surface with null d-dimensional 
Lebesgue measure. We refer the reader to [7] for all matters concerning the 
theory of medial axes. 

Let Fq C r be the subset of F where there is no uniquely defined tangent 
plane. Then, ?^'^~^(Fo) = by assumption. Consider the subset Gi of G \ 
M{G) consisting of points x such that there is a 7 € F \ Fq with x — 7 
orthogonal to the tangent plane at 7 and 7 := 7(x) is the boundary point 
closest to X. Let Go be the complement of Gi with respect to G \ M(G). 

For ah t > 0, consider the level sets T{t) := {x G G\M(G) : d(x, F) = t}, 
where d{x,T) denotes the distance between x and F. Except possibly on a 
Lebesgue null subset of Gi, we may uniquely parameterize points x £ Gi 
as x{'y,t), where 7 is the boundary point closest to x and t := ||rE — 7||. Set 
F(0) =F. Let Fi(t) :=F(i)nGi and Fo(t) :=F(t)nGo. For each 7GF\Fo, 
let be the distance between 7 and M(G), measured along the orthogonal 
to the tangent plane to F at 7. Define D := sup^gp R^. 

For any integrable function (7 : Q — )• M, an application of the co-area for- 
mula (see Theorem 3.2.12 and Lemma 3.2.34 in [14]) for the distance function 
f{x) :=d(x,F) gives 

/ g{x)dx= [ [ g{x)n'^-^{dx)dt 
Jg Jo Jr{t) 
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n 



i/d 



(7.4) 



g{x) dx 



G 







+ 



ri(in-i/d) 



g 



{x{j,tn~^/'^))n'^~\dx) 



dt 



ro(in-i/d) 



g{x)n''-^{dx) 



dt. 



We may similarly write 



n 



(7.5) 



i/'^ / g{x) dx : 



r'j{tn-l/d) 



a 



{x{j,tn~^/^))n'^~\dx) 



dt 



+ 



g{x)n'^~\dx) 



dt. 



/ri(tn-i/d) 

where D\r[{-) and rQ(-) are now the analogs of L'',ri(-) and ro(-), respec- 
tively. 

To simplify the notation and to set the stage for developing the analog of 
(7.2), we define, for all x E Gi, the shifted and n^/'^-dilated binomial point 
measures 

(7.6) Vnh,t) := n^l\Xn-r - x(7, trT^I'')) U {0}, 
together with the shifted and dilated bodies 

(7.7) G„(7,t) := n^l\G - x{^, tn-^-^)). 

We similarly define the shifted and A^/'^-dilated Poisson point measures 

(7.8) Vx{l,t) := \^/\Vx - x(7,iA-^/'^)) U {0} 
and the shifted and dilated bodies 

(7.9) Gx{l.t):=\^/\G-x{^,t\-^l'')). 
More generally, for all x £ G, we write 

Vn{x) :=n^/\Xn-i-x)U{0} 

and similarly for Gn{x),V\{x) and Gx{x). 

Roughly speaking, the dilated bodies Gnij, t) are converging locally around 
to a half-space, whereas the restrictions of the point processes 
'Pniljt) to Gn(7,t) are converging to a homogeneous point process on this 
half-space [see (7.12)]. 

The next result, a consequence of this observation, is similar to (7.2) and is 
the key to all that follows. It shows that for all 7 G T \ Tq and t G (0, 00), the 
Hausdorff measure of faces incident to and belonging to the inner sewing 
<S~{T'n{'y,t),dGn{'y,t)) converges in distribution to the Hausdorff measure 
of faces incident to and belonging to the inner sewing S~ {V^ jdWLt). 
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Lemma 7.1. For a// 7 G T \ Fq and t G (0, 00), we have 

(7.10) e(o,p„(7,t),G„(7,t)) mrx{i,t),Gxh,t))^m- 

Proof. We will only prove the first result since the second follows by 
identical methods. Using a rotation if necessary, we will, without loss of 
generality, assume that the vector — (7, 0) is orthogonal to the boundary 
of the half-space WLf. 

For an arbitrary point set 3^ C M'^ and body D, let ^Biv^y^D) be the sum 
of the HausdorfF measures of the faces of the Delaunay triangulation of 3^ R D 
lying wholly inside D and incident to y G 3^. Recall that, by construction, 
the faces giving nonzero contribution to ^(0, 'P„(7, t), G„,(7, t)) have vertices 
belonging to G„(7,t). Since the boundary of G„(7,t) is differentiable, it 
follows for large enough n that these faces will eventually belong entirely to 
Gn(7,t). In other words, 

lim |e(0, P„(7, t), G„(7, t)) - ^5(0, Vn{l, t) D Gnil, t))\=0 a.s. 

n^oo 

(7.11) 

Since is at a distance t from the boundary of Gn^Jjt), it follows that 
the value of at with respect to ?^ n (M'^^-'^ n (— oo,t]) is determined 
by the points of ^ n (M'^-^ n {-00, t]) in a ball centered at and of radius 
R := max(t,i?o)) where Rq is the stabilization radius at for the graph of 
the standard Delaunay triangulation of U 0. Thus, reasoning exactly as in 
(7.1) and (7.2), we have that 7i D (R''"^ n (-00, t]) is a continuity point for 
the function g{A) := ^^(0,^) with respect to the topology on C induced by 
the metric D at (7.1). Since 

(7.12) p„(^,t)nG„(7,t) A^n(M^-in(-oo,t]), 

it therefore follows by the continuous mapping theorem that 

^^(0, Vnii, t) n G„,(7, t)) A ^b{o, n n (m^-^ n (-00, t])) = e(t). 

Combining this with (7.11), we have ^(0,P„(7,t),Gn(7,t)) — ^^ii), which 
completes the proof. □ 

Given (7.10), one has convergence of the corresponding expectations in 
(7.10), provided the random variables satisfy the customary uniform inte- 
grability condition. This is the content of the next lemma. 

Lemma 7.2. For a// 7 G F \ Fq and all t > 0, we have 

lim E[e(0,n,(7,i),G„(7,t))] = lim E[aO,Vx{l,t),Gx{l,m =nm]- 
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Proof. It is straightforward that the empty sphere criterion charac- 
terizing Delaunay triangulations (see, e.g., Chapter 4 of [24]) imphes that, 
uniformly in t, the volume of a Delaunay simplex in Vnij^t) has expo- 
nentially decaying tails, this being equivalent to the tail probability that a 
circumcircle (circumsphere in dimension greater than 2) contains no points 
from the binomial point process Vn{l,t). It follows that for p > 0, there is a 
constant C{p) such that 

(7.13) supsupE[|e(0,P„(7,t),G'„(7,i))n <C^(P), 

n (7,t) 

showing that 

{e(0,7'„(7,t),G„(7,t)),n>l} 

are uniformly integrable. By Lemma 7.1, we obtain the desired convergence 
of E[C (0, P„ (7, t ) , G„ (7, t ) )] . The convergence of E [^(0, ^ (7, * ) , Ga (7, * ) )] fol- 
lows using identical methods. □ 

The next two lemmas are also consequences of exponential decay of the 
volume of the circumspheres not containing points from 'n}^'^Xn- The next 
result shows that the expectations appearing in Lemma 7.2 are uniformly 
bounded, a technical result foreshadowing the upcoming use of the domi- 
nated convergence theorem. 

Lemma 7.3. There is an integrable function -F:[0,oo)— t- [0,oo), with 
exponentially decaying tails, such that 

sup sup E[^{0,Vn{x),Gn{x))]<F{t) 

" a;er{tn-Vd) 

and 

sup sup E[^{0,Vx{x),Gx{x))]<F{t). 

A a;er(tA-l/d) 

Proof. For all x G r(in-^/'^), let 

E{x) := {0 belongs to a face/ of a simplex in V{Vn{x))J n dUf / 0}. 

By definition, ^ vanishes on E^{x). It is easy to see that P[E{x)] has expo- 
nentially decaying tails in t, uniformly in n and 7. The first result follows by 
considering (^{0,Vn{x),Gn{x))l{E{x)), and applying the Cauchy-Schwarz 
inequality and the bound (7.13) with p = 2. The second result follows simi- 
larly. □ 

Our last lemma provides a high probability bound on the diameter of 
Delaunay simplices, a result which follows immediately from the exponential 
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decay of the volume of spheres not containing points from Xn or V\. In other 
words, making use of the bounds P[Vx n Br{x) = 0] = ex.p{—Xr'^Vd), where 
Vd is the volume of the unit radius d-dimensional ball, as well as the bounds 
P[Xn n Br{x) = 0] = (1 — r'^Vd)"', we obtain the fohowing result. 

Lemma 7.4. For any constant A > 0, there is a constant C > such 
that, with probability exceeding 1 — n~^, the diameter of all Delaunay sim- 
plices inV{Xn) is bounded by C(logn/n)^/'^. Thus, with probability exceeding 
1 — n~^, we have 



We now have all of the ingredients needed to prove Theorem 3.1. 

Proof of Theorem 3.1. The proof has two parts: the first shows 
expectation convergence and the second uses concentration inequalities to 
deduce the a.s. convergence. 

Part I. We use the above lemmas (binomial input) to first show that 
lim^^ooE [In (/i,r)] = ad J-p hdV for h inner continuous, recalling that we no- 
tationally simplify to ^ and I~ to /„ . We omit the proof of limA-j-oo E [X^ (/i, 
r)] = ad /p /idr as it follows verbatim, using instead the Poisson input ver- 
sions of the above lemmas. 

Note that ^{Xi,Xn, G) = if ^ G. Using (3.4) and conditioning on Xi, 



sup^(X,, Af„,G) < C(logn/n)('^-i)/'^ 



i<n 



and 



supaa:,7'A,G')<C(logA/A)(''-i)/'^. 



we have 



E[4(/i, r)] = nE[/i(Xi)e(Xi, G)] 

= ni/'^E[/i(Xi)^(0, n^'\X^ - X^),n^'''{G - X^))] 




Using the scaled volume identity (7.4) and putting, for all x £ G, 
Gn{x) := h{x)E[^{0, n^l\Xr._^ - x) U {0}, nV'^(G - x))\, 



we get that 



(7.14) 
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The proof of expectation convergence will be complete once we show that 
the two integrals in (7.14) converge to Jp h{j)'H'^~^ (d'y) and zero, respec- 
tively. 

We first consider the first integral in (7.14). For fixed t > and all n, there 
is an a.e. mapping /„ : F — )• Ti{tn~^^^) of F onto the level set Ti(tn~^^'^). 
To prepare for an application of the dominated convergence theorem, we 
next show, for each t > 0, that as n — )• oo, 

(7.15) / Gn{x)n''~\dx)^ [ h{j)E[mm''~\dj). 

JTiitn-^/'') JV 

To see this, write the difference of the integrals in (7.15) as the sum of 



(7.16) / Gnix^-'idx)- Gn{Ul)W-\d-t) 
and 

(7.17) j^Gn{Ui))n'~\d^)- j^K^)m{t)m'~\d^). 

By a change of variables, the difference (7.16) equals 

Gn{fn{i))[n^^\dU{i))-n'~\d^)]. 



By the a.e. smoothness of F, we have T-L'^ ^{dfn{'y)) = (1 + e„(7))?^'^ ^{dj) 
a.e., where 6^(7) goes to zero as n— )• 00. By (7.13), we have 

supsup|G„(/„(7))| <C(l)||/i||oo 

and so the difference (7.16) goes to zero. Next, consider the difference (7.17). 
Recalling that t is fixed, for all 7 G F we write fn{l) = x{-y,tn~^^'^). As 
n — )• 00, we have h{x{'y,tn~^^'^)) — t- /i(x(7,0)) = h{-y), by inner continuity 
of h. Combining this with Lemma 7.2, we get, for all x G Ti{tn~^^'^) with 
X = x{-y, tn~^/'^), that Gn{x) converges to /i(7)E[^(i)] and so, by the bounded 
convergence theorem, the difference (7.17) goes to zero as n — )• 00. Thus, 
(7.15) holds. 

By Lemma 7.3, we have, for fixed t > 0, 

(7.18) / Gn{x)n'^-^{dx) < ||/i||oo X sup(?^"'-i(F(tn-i/^))) x F{t). 

Jri(tn-i/d) n 

By the boundedness of h and ?^"'^^(F), the right-hand side of (7.18) is inte- 
grable in t and thus the dominated convergence theorem implies that 

/ / [h{x{-f,tn-^'''))nm.Vn{i,t),Gn{imw~\dx)dt 

Jo JT{tn-^/<i) 



24 



R. JIMENEZ AND J. E. YUKICH 



converges to 

roo r roo r 

/ / h{j)K[c{t)]n'^-\dx) dt = / E[^{t)]dt / h{-f)n'^-\d-f), 

Jo Jr Jo Jt 

as desired. 

Finally, we show that the second integral in (7.14) goes to zero as n — )• oo. 
For fixed t > 0, the inside integrals in (7.14) are also bounded by the right- 
hand side of (7.18). Since as n — 7- oo, the dominated 
convergence theorem gives that the second integral in (7.14) converges to 
zero. This completes the proof of expectation convergence. 

Part II. Next, we show a.s. convergence of In{h,T) and I\{h,T). We will 
do this by appealing to a variant of the Azuma-Hoeffding concentration 
inequality. We only prove the convergence of In{h,T) since the proof of the 
convergence of Ix{h,T) is identical. Let Ci be a positive constant. For all n, 
define the "thickened boundary" 

G{n) — {xeG: d{x, T) < Ci{\ogn/nf/'^}. 

By smoothness of F, it follows that v{n) := volume(G(n)) = 0((logn/n)^/'^). 
Define 

nv{n) 

(7.19) PniKT) := h{Xk)i{Xk,X^,^n).G), 

k=l 

where Xf^, k>l, belong to G{n), Xnv{n) '■= {^i-, ■ ■ ■ iXnv(n)}- Iii contrast 
to In(/i,F), note that I'n{h,T) contains a deterministic number of nonzero 
terms and is therefore more amenable to analysis. Our goal is to show that 
I'n{h,T) well approximates In{h,T) both a.s. and in L^, and then to obtain 
concentration results for I'n{h,T). The proof of a.s. convergence proceeds in 
the following four steps. 

Step (a). With high probability, the summands h{Xk)S,iXk, ^mG) con- 
tributing a nonzero contribution to /„(/i,F) arise when Xf^. belongs to the 
thickened boundary G{n). There are roughly nv{n) such summands and 
thus the convergence of In{h,T) may be obtained by restricting attention to 
the statistic I'n{h,T) defined at (7.19). Our first goal is to make this precise. 

The number of sample points in Af„ belonging to G{n) is a binomial 
random variable B{n,v{n)). Relabeling, we may, without loss of general- 
ity, assume that Xi, . . . , XB(^n,v{n)) belong to G{n), where we suppress the 
dependency of X/^ on n. 

Define 

B{n,v{n)) 

{n,v{n)) ) G), 

k=l 
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where XB(n,v{n)) '■= {^l, ■ ■ ■ , XB(n,v{n))} ■ 

By Lemma 7.4 and recalling the definition of G{n), if Ci is large enough, 
then, with high probability, the simplices defining the inner sewing of G 
belong to G{n) and thus it follows that for any constant A, there is a Ci 
large enough such that 

(7.20) P[in{h,r)^in{h,r)]<n~^. 

We will return to this bound in the sequel. 

Step (b). We next approximate i„(/i,r) by /'„(/i,r). Given In{h,T), we 
replace B{n,v{n)) by its mean, which we assume, without loss of generality, 
to be integral (otherwise, we use the integer part thereof). Observe that 

|/\(/i,r)-/„(/i,r)| 

is bounded by the product of four factors: 

(i) the difference between the cardinalities of the defining index sets, namely 
\B{n, v{n)) — nv{n)\; 

(ii) the maximal number N of summands affected by either deleting or 
inserting a single point into either XB{n,v{n)) or Xny(^n) 

(iii) SUPk<nv(n)^iXk G); 

(iv) the sup norm of h on G^ namely ||^||cxd* 

However, with high probability, we have these bounds: 

\B{n, v{n)) — nv{n)\ < C2 \og{nv{n)){nv{n))^^'^ ^ 
N ^ Cz^ogn (by Lemma 7.4) and, by the analog of Lemma 7.4, 
sup i{Xk,Xnv{n),G)+ sup ^(Xfc, , G) < C4(log n/n) (^"1)/'^. 

k<nv{n) k<B{n,v(n)) 

Here, and elsewhere, Ci,C2,... denote generic constants. We consequently 
obtain the high probability bound 

\i'n{h,T)-in{KT)\ 

(7.21) <C2[log(ni;(n))(™(n))i/2][C73logn][C4(logn/n)('^-i)/'^]||/i||oo 
<C5(logn)3n-('^-i)/2'^. 

Step (c). We combine (7.20) and (7.21), and take A large enough in (7.20) 
to get the high probability bound 

|/\(/i,r) -/„(/i,r)| < C6(logn)3n-('^"i)/2'^. 

Since |/'„(/i,r) — /„(/i,r)| is deterministically bounded by a multiple of n, 
it follows that 

E[|/\(/i,r)-/„(/i,r)|]^o, 
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whence 



E[Pnih,r)]^adJ^h{x)dT. 



It thus suffices to show that 

(7.22) \Pn{h,T)-K[i'n{h,r)]\^0 a.s. 

Step (d). We complete the proof by showing (7.22). We do this by using a 
variant of the Azuma-Hoeffding inequahty, due to Chalker et al. [6]. Write 
I{Xi, . . . ,Xnv{n)) instead of I'n{h,T). Consider the martingale difference 
representation 

nv{n) 

I{Xi,...,Xnv{n)) - E[/(Xi, . . . ,X„^,(„))] = ^ di, 

1=1 

where di := E[I{Xi, . . . , X„^(„))| J'i] - E[/(Xi, . . . , X„„(„))| here be- 

ing the fj-field generated by Xi, . . . , Xi. Observe that 

di := E[I{Xi, . . . , Xnv{n))\^i] - IE[/(Xi, . . .,X'i, . . . ,Xnv(n))\^i], 

where X'^ signals an independent copy of Xj. By the conditional Jensen 
inequality and Lemma 7.4, it follows that 

\di\ < E[\I{Xi, . . .,Xnv(n)) - H^l^- ■ , ^nv{n))\^i] 

<C7(logn/n)("'-i)/'^ 

holds on a high probability set, that is, for all A> 0, there is a Cj such that 
P[\di\ > C7(logn/n)('^-i)/'^] < n"^. If the {di) i were uniformly bounded in 
sup norm by o(l), then we could use the Azuma-Hoeffding inequality. Since 
this is not the case, we use the following variant (see Lemma 1 of [6]), valid 
for all positive scalars Wi,i > 1: 



P 

(7.23) 



nv{n) -| / \ 

< 2 exp ( I 



i=l 



>t 



nv{n) 

+ (l + 2t-^ sup \\di\\oo) Y P[\di\ > Wi 



i<nv{n) 



i=l 



Let Wi = C7(logn/n)('^-^)/'^. We have EZ^^^ = C8(logn)2-(Vrf)f^-(rf-i)/rf^ 
showing that the first term in (7.23) is summable in n. Since ||dj||oo < 
and > Wi\ < n~^, the second term in (7.23) is summable in n. Since t 

is arbitrary, this gives (7.22), as desired. □ 

Proof of Theorem 3.2. We will follow ideas given in [23], which also 
involves functionals whose expectations decay exponentially fast with the 
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distance to the boundary. For completeness, we provide the details, following 
in part [19] and [4]. 

Our goal is to show that 

hm A('^-i)/'^Var[/A(/i,r)] = Vd / h\^)'H''~\d-i) , 

A-s-oo Jy 

where I\{h,T) and are defined at (3.10) and (3.12), respectively. 
Let ix{x,Vx,G) ■.= i{\^l'^x,\^l'^Vx,\^/'^G). By scaling (3.4), we have 



Var[/A(/i,r)] = A-('^-^)/'^Var 



On the other hand, Campbell's theorem (see Chapter 13 of [12]) gives 



(7.24) 



where 



;^-(rf-i)/rfVar 



■ ■]h{x)h{y) dy dx 



IG 



+ \^''' / nil{^,'Px,G)]h\x)dx, 



G 



[■ ■ •] := nUx, Vx U y, G)Uy, Vx U x, G)] - nU^, Vx, G)]nUy,Vx,G)]. 
As in [19], in the double integral in (7.24), we put y = x + \~^^'^z, thus giving 

(7.25) 




IG 



■]h{x)h{y) dy dx 
X^l'^ I I Fx{z,x)h{x)h{x + X-^/'^z)dzdx, 



IG 



where 



Fa(z, x) := nU^.Vx U {x + \-^/''z],G)ix{x + A-i/^z,^ U x, G)\ 
- E[eA(x, Pa, G)]E[a(x + X-'l^z, Vx,G)] 

and where we adopt the convention that ^(x, 3^, G) is short for ^(x, 3^ U x, G) 
when X is not in y. By the definition of ^a and by translation invariance, 
we obtain that Fx{z,x) is equal to 

E[e(0, \^'\Vx - x) U {z}, \^'\G - xmz, X^'^Vx - x) U {0}, X^'^G - x))] 
- E[e(0, X^'\Vx - x), X^I\G - x))]E[e(z, X^'^Vx - x), X^I\G - x))]. 
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Write X := x(7,tA~^/'') and recall the definitions of V\{'^,t) and G\{'~f,t) 
from (7.8) and (7.9), respectively, so that the above becomes 

Fxiz,x) = E[e(0,PA(7,t) U {z}, X'/'^iG - x))e(z,n(7,t), X'/\G - x))] 

- E[ao, Pa (7, t),X'/\G - x))miz, Pa (7, t), X'^\G - x))]. 

Recalling that ^B(y,3^n D) is the normalized Hausdorff measure of the 
faces of the Delaunay triangulation of 3^ n D lying inside D and incident to 
y, the above becomes 

Fx{z, x) = E[Cb(0, [Pa (7, t) U {z]\ n Ga(7, msiz, [Pa (7, t) U {z}] n Ga(7, t))\ 
- E[Cij(0,PA(7, t) n GxiimniBiz, [Pa (7, t) U {z}] n Ga(7, t))]. 
Next, we have a two-dimensional version of (7.10), namely 

[^^(0, [PA(7,t) U {z}] n Gx{l.t)),iB{z, [PA(7,i) U {z}] n Gxiim 

A [^^(0, [P° u {z}] n ),eB(^, [P° u {z}] n Mf)], 

from which it follows from uniform integrabihty that as A — ?• 00, 

(7.26) Fx{z,x)^ct{z), 

where ct{z) is as in (3.13). 

We now find the large A behavior of the integrals at (7.25). Recalling the 
scaled volume identity (7.4) and recalling that x = x{'~f,tX~^^'^), we get, after 
substitution, that 

^{d+i)/d I I [...]h{x)h{y)dydx 

Jg JRd. 

/ Jxiz,t,j)dzV''~^ 



(7.27) = / / / Jxiz,t,j)dzn" 

Jo Jri(tA-i/d) JKd 



(dx) dt 



+ / Jxiz,t,j)dzn'^^\dx)dt, 

Jo JVoitX-'^/'i) jRd 

where 

Jx{z, t, 7) := Fx {z, x{j, a- V'^))/i(x(7, tA" V'^))/i(x(7, tX-^") + A-^/'^z) . 

Notice that Jx{z,t,'y) is dominated uniformly in A by a function F{z,t,j) 
decaying exponentially fast in |^;| and t. By the a.e. continuity of h and the 
convergence (7.26), the integrand in the first integral tends to ct{z)h'^{'y) as 
A — )■ 00. Bounding the integrand by ||/i||^F(2;, t, 7) and applying dominated 
convergence, we obtain that as A — )• 00, the first integral in (7.27) converges 
to 

(7.28) / / / h^{j)ct{z)dzn'^-\d-f)dt. 
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The second integral in (7.27) converges to zero, by same methods used to 
show that the second integral in (7.14) goes to zero. Indeed, 



/ro(tA-i/d) 

is bounded by an integrable function of t which is going to zero as A — )• oo 
since J^d Jx{z,t,^) dz are bounded uniformly in 7 and t, and H'^~Hro(tA"i/'^)) 
0. 

On the other hand, the single integral at (7.24) satisfies the identity 

X'/^ [ E[Cl{x,Vx,G)]h\x)dx = X'/^ [ E[Cl{x,Vx,G)]h\x)dx, 
Jg Jg 

which, as A — )■ 00, tends to 

/■oo 

(7.29) / / E[e,'^{t)]dth^{-i)'H'^-^{d-i)dt. 

Jo Jr 

Combining (7.28) and (7.29), and recalling the definition of at (3.12), we 
obtain 

lim X^'^'^y'^YRT[Ix{h,r)] = Vd [ h\j)n'^~\d^). 
This completes the proof of Theorem 3.2. □ 
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